Stability-Based Comparison of Class Discovery Methods for DNA Copy Number Profiles
Abstract
MOTIVATION: Array-CGH can be used to determine DNA copy number, and copy number imbalances are a fundamental factor in the genesis and progression of tumors. The discovery of classes with similar patterns of array-CGH profiles therefore adds to our understanding of cancer and of the treatment of patients. Various input data representations for array-CGH, dissimilarity measures between tumor samples, and clustering algorithms may be used for this purpose, and the choice between them is often difficult. An evaluation procedure is therefore required to select the best class discovery method (a combination of one input data representation, one dissimilarity measure and one clustering algorithm) for array-CGH. Robustness of the resulting classes is a common requirement, but no stability-based comparison of class discovery methods for array-CGH profiles has been reported.

RESULTS: We applied several class discovery methods and evaluated the stability of their solutions with a modified version of Bertoni's χ²-based test [1]. Our version relaxes the independence assumption required by Bertoni's original χ²-based test. We conclude that the most robust classes of array-CGH profiles are produced by using Minimal Regions of alteration (a concept introduced by [2]) as the input data representation, sim [3] or agree [4] as the dissimilarity measure, and average group distance in the clustering algorithm.

AVAILABILITY: The software is available from http://bioinfo.curie.fr/projects/cgh-clustering. It has also been partly integrated into "Visualization and analysis of array-CGH" (VAMP) [5]. The data sets used are publicly available from ACTuDB [6].
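To make the idea of a stability-based evaluation concrete, the following is a minimal sketch, not the modified Bertoni test used by the authors. It assumes hypothetical profiles already reduced to discrete states per region (-1 loss, 0 normal, 1 gain), uses a simple mismatch fraction as a stand-in for the sim and agree measures (whose definitions are not given here), clusters with average group distance, and scores stability as the mean Rand index between the clustering of the full data and clusterings of random subsamples. All function names and parameters are illustrative.

```python
import random
from itertools import combinations

def mismatch(a, b):
    """Fraction of regions whose states differ (illustrative stand-in measure)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def average_linkage(profiles, k):
    """Naive agglomerative clustering with average group distance; returns labels."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            # Average pairwise dissimilarity between the two groups.
            d = sum(mismatch(profiles[i], profiles[j])
                    for i in clusters[p] for j in clusters[q])
            d /= len(clusters[p]) * len(clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        _, p, q = best
        clusters[p] += clusters[q]   # merge the closest pair (p < q)
        del clusters[q]
    labels = [0] * len(profiles)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def rand_index(a, b):
    """Share of sample pairs on which two labelings agree about co-membership."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def stability(profiles, k, n_rounds=20, fraction=0.8, seed=0):
    """Mean Rand index between the full clustering and subsample clusterings."""
    rng = random.Random(seed)
    full = average_linkage(profiles, k)
    n = len(profiles)
    scores = []
    for _ in range(n_rounds):
        idx = sorted(rng.sample(range(n), int(fraction * n)))
        sub = average_linkage([profiles[i] for i in idx], k)
        scores.append(rand_index([full[i] for i in idx], sub))
    return sum(scores) / len(scores)

# Toy data (hypothetical): five profiles with gains, five with losses.
gain = [1, 1, 1, 0, 0, 0]
loss = [0, 0, 0, -1, -1, -1]
profiles = [list(gain) for _ in range(5)] + [list(loss) for _ in range(5)]
score = stability(profiles, k=2)
```

On this toy input the two groups are perfectly separated, so every subsample reproduces the same partition. With real profiles, a score closer to 1 would suggest that a given combination of representation, dissimilarity and clustering yields classes that survive resampling; a formal test such as the one described in the abstract is still needed to decide whether differences between methods are significant.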
Similar resources
I-37: Establishing High Resolution Genomic Profiles of Single Cells Using Microarray and Next-Generation Sequencing Technologies
The nature and pace of genome mutation is largely unknown. Standard methods to investigate DNA-mutation rely on arraying or sequencing DNA from a population of cells, hence the genetic composition of individual cells is lost and de novo mutation in cell(s) is concealed within the bulk signal. We developed methods based on (SNP-) arraying and next-generation sequencing of single-cell whole-genom...
Assessment of mitochondrial DNA copy number in peripheral blood leukocyte of opiate abusers and healthy individuals
Background: Based on the studies, variation in the mitochondrial DNA (mtDNA) copy number in peripheral blood leukocytes is associated with increased susceptibility to diseases including cancer. Opiate abusers are at high risk for diseases. In this study, we measured the mtDNA copy number in peripheral blood leukocytes in a group of opiate abusers compared with those in healthy individuals. Met...
Performance evaluation of DNA copy number segmentation methods
A number of bioinformatic or biostatistical methods are available for analyzing DNA copy number profiles measured from microarray or sequencing technologies. In the absence of rich enough gold standard data sets, the performance of these methods is generally assessed using unrealistic simulation studies, or based on small real data analyses. To make an objective and reproducible performance ass...
Bayesian Disease Classification Using Copy Number Data
DNA copy number variations (CNVs) have been shown to be associated with cancer development and progression. The detection of these CNVs has the potential to impact the basic knowledge and treatment of many types of cancers, and can play a role in the discovery and development of molecular-based personalized cancer therapies. One of the most common types of high-resolution chromosomal microarray...
Statistical Methods for High-Throughput Biological Data
The explosion in DNA microarray technology in the last decade has given rise to extensive biological data in the form of expression profiles of tens of thousands of genes and proteins, often from only a handful of tissue samples. The principal objective of a high-throughput experiment can be generally characterized as one of class comparison, class prediction or molecular pattern discovery. Cla...